feat: v0.1.72 #1043

Merged
Henry-811 merged 18 commits into main from devv0.1.72
Apr 3, 2026

Conversation

@Henry-811
Collaborator

@Henry-811 Henry-811 commented Apr 2, 2026

PR Title Format

Your PR title must follow the format: <type>: <brief description>

Valid types:

  • fix: - Bug fixes
  • feat: - New features
  • breaking: - Breaking changes
  • docs: - Documentation updates
  • refactor: - Code refactoring
  • test: - Test additions/modifications
  • chore: - Maintenance tasks
  • perf: - Performance improvements
  • style: - Code style changes
  • ci: - CI/CD configuration changes

Examples:

  • fix: resolve memory leak in data processing
  • feat: add export to CSV functionality
  • breaking: change API response format
  • docs: update installation guide

Description

Brief description of the changes in this PR

Type of change

  • Bug fix (fix:) - Non-breaking change which fixes an issue
  • New feature (feat:) - Non-breaking change which adds functionality
  • Breaking change (breaking:) - Fix or feature that would cause existing functionality to not work as expected
  • Documentation (docs:) - Documentation updates
  • Code refactoring (refactor:) - Code changes that neither fix a bug nor add a feature
  • Tests (test:) - Adding missing tests or correcting existing tests
  • Chore (chore:) - Maintenance tasks, dependency updates, etc.
  • Performance improvement (perf:) - Code changes that improve performance
  • Code style (style:) - Changes that do not affect the meaning of the code (formatting, missing semi-colons, etc.)
  • CI/CD (ci:) - Changes to CI/CD configuration files and scripts

Checklist

  • I have run pre-commit on my changed files and all checks pass
  • My code follows the style guidelines of this project
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes

Pre-commit status

# Paste the output of running pre-commit on your changed files:
# uv run pre-commit install
# git diff --name-only HEAD~1 | xargs uv run pre-commit run --files # for last commit
# git diff --name-only origin/<base branch>...HEAD | xargs uv run pre-commit run --files # for all commits in PR
# git add <your file> # if any fixes were applied
# git commit -m "chore: apply pre-commit fixes"
# git push origin <branch-name>

How to Test

Add test method for this PR.

Test CLI Command

Write down the test bash command. If there are prerequisites, please call them out.

Expected Results

Description/screenshots of expected results.

Additional context

Add any other context about the PR here.

Summary by CodeRabbit

  • Chores

    • Patch release version bumped to 0.1.72.
  • New Features

    • LLM circuit-breaker expanded across multiple backends.
    • Grok backend migrated to the Responses API and now advertises x_search and code_execution.
    • New flags: enable_x_search and enable_code_execution; capabilities expose x_search and combined code_execution.
  • API

    • Provider tool generation and request-parameter handling updated to emit x_search and code_execution and adjust related tooling constraints.
  • Tests

    • Comprehensive circuit-breaker and config plumbing tests added.
  • Documentation

    • Release notes, READMEs, and announcements updated for v0.1.72.

amabito and others added 11 commits March 31, 2026 11:09

Same integration pattern as claude.py:
- Extract llm_circuit_breaker_* kwargs in __init__
- Wrap chat.completions.create with call_with_retry
- Handle CircuitBreakerOpenError separately from context errors
- enabled=false default preserves existing behavior

Wraps _create_response_stream with call_with_retry, covering all
4 call sites. WebSocket transport bypasses CB (direct streaming).
enabled=false default preserves existing behavior.

Gemini keeps its own backoff retry loop; CB gates entry and tracks
outcomes across calls. 503 (overload) recorded as CB failure same
as 429. enabled=false default preserves existing behavior.

38 tests covering ChatCompletions, Response API, and Gemini:
- Config plumbing: CB kwargs accepted, stripped, default disabled
- State transitions via CB instance on each backend
- 429 classification (WAIT/STOP/CAP)
- Gemini 503 handling as CB failure
- Disabled bypass (enabled=false never blocks)
- Config validation (invalid max_failures raises ValueError)

- gemini.py: CircuitBreakerOpenError single-string message (was 2 args)
- tests: use public failure_count property instead of _failure_count
- tests: isinstance assertion instead of is not None
- tests: add integration test for CB OPEN raising error through backend

- Replace asyncio.get_event_loop().run_until_complete with async/await
- Replace vacuous self-raise Gemini test with call_with_retry assertion
- Add Response API integration test for CB OPEN
- All 3 backends now have real call_with_retry integration tests

Codex-verified review found _stream_without_custom_and_mcp_tools
was missing start_api_call_timing before _create_response_stream.
Added timing + end_api_call_timing in CircuitBreakerOpenError handler.
Also added missing reset_time_seconds validation tests for Response
and Gemini backends.

… path

Codex review found two timing leaks exposed by the start_api_call_timing
addition: compression-retry branch and else-raise branch both lacked
end_api_call_timing calls. Now all exit paths properly end timing.

- Add failure_count == 0 assertion to disabled bypass tests
- Add mock_sleep assertion for Retry-After value in WAIT test

@coderabbitai

coderabbitai bot commented Apr 2, 2026

📝 Walkthrough

Integrates an LLMCircuitBreaker into ChatCompletions, Response, and Gemini backends; migrates Grok to the Responses API and adds x_search/code_execution plumbing; updates API-param/tooling and agent config; adds comprehensive circuit-breaker tests; bumps package version to 0.1.72.

Changes

| Cohort / File(s) | Summary |
|---|---|
| Version: massgen/__init__.py | Bumped package __version__ from "0.1.71" to "0.1.72". |
| LLM circuit-breaker integration: massgen/backend/chat_completions.py, massgen/backend/response.py, massgen/backend/gemini.py, massgen/tests/test_llm_cb_backends.py | Add _build_circuit_breaker_config, instantiate self.circuit_breaker, wrap streaming/response HTTP calls with call_with_retry, add gating via should_block()/CircuitBreakerOpenError, and record success/failure. New tests exercise CB behaviors and validation. |
| Grok / Responses API & tooling: massgen/backend/grok.py, massgen/backend/capabilities.py, massgen/backend/response.py, massgen/api_params_handler/_response_api_params_handler.py, massgen/agent_config.py | GrokBackend now inherits from ResponseBackend; add x_search capability and code_execution builtin tool; propagate enable_x_search/enable_code_execution through agent config and API-param handling; remove legacy extra_body.search_parameters. |
| Behavior & config validation updates: massgen/chat_agent.py, massgen/config_validator.py, massgen/tests/test_backend_capabilities.py | Expose x_search capability, compute code_execution from enable_x_search OR enable_code_interpreter, validate enable_x_search for Grok only; update capability tests. |
| Tests & test harness updates: massgen/tests/test_backend_event_loop_all.py, massgen/tests/test_backend_capabilities.py | Adjust test fakes to use responses surface for Grok, update model names and expectations; add capability tests for x_search and code execution. |
| Configs: massgen/configs/features/trace_analyzer_background.yaml, massgen/configs/providers/others/grok_single_agent.yaml | Switch active agent to Grok with enable_x_search, update provider YAMLs to remove legacy search params and change Grok model/defaults. |
| Docs & release files: CHANGELOG.md, README.md, README_PYPI.md, ROADMAP.md, docs/... | Add v0.1.72 release notes and announcements; update docs/roadmaps/readmes to reference v0.1.72 and v0.1.73 roadmap targets; add archived v0.1.71 announcement. |
| Contributor guidance: CONTRIBUTING.md | Advance "next version" entries to v0.1.73 and update branch/PR target instructions. |

Sequence Diagram(s)

sequenceDiagram
    participant Client
    participant Backend
    participant CircuitBreaker
    participant LLM_API

    Client->>Backend: start stream / create response (api_params, kwargs)
    Backend->>CircuitBreaker: instantiate / _build_circuit_breaker_config (init)
    Backend->>CircuitBreaker: should_block()
    alt circuit open
        CircuitBreaker-->>Backend: raise CircuitBreakerOpenError
        Backend-->>Client: propagate error (circuit_breaker_open)
    else closed/half-open
        Backend->>CircuitBreaker: call_with_retry(lambda -> LLM_API.call(api_params))
        CircuitBreaker->>LLM_API: invoke API call
        alt success
            LLM_API-->>CircuitBreaker: response
            CircuitBreaker->>CircuitBreaker: record_success()
            CircuitBreaker-->>Backend: return stream/response
            Backend-->>Client: stream response
        else retryable failure
            LLM_API-->>CircuitBreaker: error (e.g., 429/503)
            CircuitBreaker->>CircuitBreaker: decide WAIT/STOP, sleep if needed
            loop retries
                CircuitBreaker->>LLM_API: retry call
            end
            alt retries exhausted or STOP
                CircuitBreaker->>CircuitBreaker: record_failure(reason)
                CircuitBreaker-->>Backend: raise error
                Backend-->>Client: propagate error
            end
        end
    end
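
The flow in the diagram can be sketched roughly as below. This is a simplified, synchronous illustration and not the massgen implementation: the class name `SketchCircuitBreaker`, the `clock` parameter, the `max_attempts` argument, and the default values are assumptions made for the example. Only the names `should_block`, `record_success`, `record_failure`, `call_with_retry`, `max_failures`, `reset_time_seconds`, and `CircuitBreakerOpenError` appear in the PR, and their real signatures may differ.

```python
import time
from enum import Enum
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class CircuitState(Enum):
    CLOSED = "closed"
    OPEN = "open"
    HALF_OPEN = "half_open"


class CircuitBreakerOpenError(RuntimeError):
    """Raised when a call is rejected because the circuit is open."""


class SketchCircuitBreaker:
    """Hypothetical, simplified breaker used only for illustration."""

    def __init__(
        self,
        max_failures: int = 3,
        reset_time_seconds: float = 60.0,
        clock: Callable[[], float] = time.monotonic,  # injectable for tests (assumption)
    ) -> None:
        self.max_failures = max_failures
        self.reset_time_seconds = reset_time_seconds
        self._clock = clock
        self.state = CircuitState.CLOSED
        self.failure_count = 0
        self._opened_at = 0.0

    def should_block(self) -> bool:
        # An open circuit moves to HALF_OPEN once the reset window has elapsed.
        if self.state is CircuitState.OPEN:
            if self._clock() - self._opened_at >= self.reset_time_seconds:
                self.state = CircuitState.HALF_OPEN
                return False
            return True
        return False

    def record_success(self) -> None:
        self.failure_count = 0
        self.state = CircuitState.CLOSED

    def record_failure(self) -> None:
        self.failure_count += 1
        # A failed probe in HALF_OPEN reopens the circuit immediately.
        if self.state is CircuitState.HALF_OPEN or self.failure_count >= self.max_failures:
            self.state = CircuitState.OPEN
            self._opened_at = self._clock()

    def call_with_retry(self, fn: Callable[[], T], max_attempts: int = 2) -> T:
        if self.should_block():
            raise CircuitBreakerOpenError("circuit breaker is open")
        last_error: Optional[Exception] = None
        for _ in range(max_attempts):
            try:
                result = fn()
            except Exception as exc:  # real code would classify 429/503 here
                last_error = exc
                continue
            self.record_success()
            return result
        # Retries exhausted: count one failure against the breaker.
        self.record_failure()
        assert last_error is not None
        raise last_error
```

Injecting the clock keeps the state transitions testable without real sleeps, which is also what the later review comment about `time.sleep(1.1)` is asking for.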

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

Suggested reviewers

  • a5507203
  • ncrispino
🚥 Pre-merge checks | ✅ 1 | ❌ 5

❌ Failed checks (3 warnings, 2 inconclusive)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Description check | ⚠️ Warning | The PR description is entirely a template with no actual content filled in. All sections lack concrete information about changes, test methods, expected results, or additional context. | Complete the description by providing a brief summary of changes, selecting the appropriate type-of-change checkbox, filling in test commands and expected results, and adding relevant context about the implementation. |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 48.11% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |
| Documentation Updated | ⚠️ Warning | PR updates release documentation but lacks required technical documentation for new config options, YAML schema, method docstrings, circuit breaker design docs, and breaking change migration guide. | Add user guide documentation for new Grok features, YAML schema reference, Google-style docstrings on new methods, circuit breaker design documentation, and migration guide for backend inheritance change. |
| Title check | ❓ Inconclusive | The title 'feat: v0.1.72' follows the required format and indicates a new feature/release, but is overly vague and does not clearly describe the specific changes made in the PR. | Replace with a more descriptive title that summarizes key changes, such as 'feat: add Grok Response API backend and extend circuit breaker to ChatCompletions/Gemini' or similar. |
| Config Parameter Sync | ❓ Inconclusive | Unable to access repository to verify that both get_base_excluded_config_params() functions were updated with new YAML parameters synchronously. | Execute local verification to confirm massgen/backend/base.py and massgen/api_params_handler/api_params_handler_base.py both include enable_x_search, enable_code_execution, and llm_circuit_breaker* parameters. |

✅ Passed checks (1 passed)

| Check name | Status | Explanation |
|---|---|---|
| Capabilities Registry Check | ✅ Passed | Backend capabilities have been updated with X_SEARCH and code_execution capabilities, Grok backend configuration includes new supported capabilities and builtin tools, and test file validates these changes. |


ncrispino and others added 3 commits April 2, 2026 00:37
feat: extend LLM circuit breaker to ChatCompletions, Response API, Gemini (Phase 2)

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
massgen/backend/response.py (1)

275-348: ⚠️ Potential issue | 🟠 Major

Run the compressed retry through the same timed stream-creation path.

After the first context-length failure, both branches close the timing window and then call _create_response_stream() again directly. The direct retries at Line 343 and Line 575 never get a fresh start_api_call_timing() and they bypass the new CircuitBreakerOpenError bookkeeping, so success/failure metrics for the compressed request can be wrong.

As per coding guidelines, massgen/backend/**/*.py: Ensure error handling follows existing patterns.

Also applies to: 499-580

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@massgen/backend/response.py` around lines 275 - 348, The compressed-retry
path calls _create_response_stream() directly without restarting timing or
reusing the same CircuitBreakerOpenError/metrics handling; update the retry
logic (around _compression_retry handling) to set _compression_retry=True, call
self.start_api_call_timing(model) before invoking _create_response_stream(), and
run the same try/except block used for the original call (including
CircuitBreakerOpenError catching and self.end_api_call_timing(...) on failures)
so the compressed attempt records correct success/failure and circuit-breaker
bookkeeping; ensure to also clear or adjust previous_response_id as you already
do and keep references to _create_response_stream, start_api_call_timing,
end_api_call_timing, CircuitBreakerOpenError, and _compression_retry.
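
One way to read this suggestion is to route both the original and the compressed attempt through a single helper that opens and closes the timing window. The sketch below illustrates that shape only. `TimedBackendSketch`, `_timed_create`, `create_with_compression_retry`, `ContextLengthError`, the `end_api_call_timing` parameters, and the placeholder "compression" step are all assumptions for the example and do not reflect the real `response.py` code.

```python
from typing import Any, Callable, List, Tuple


class CircuitBreakerOpenError(RuntimeError):
    """Stand-in for the breaker's open-circuit error."""


class ContextLengthError(RuntimeError):
    """Hypothetical stand-in for a context-length failure."""


class TimedBackendSketch:
    """Hypothetical backend that records timing events in a list."""

    def __init__(self, create_stream: Callable[[dict], Any]) -> None:
        self._create_stream = create_stream
        self.events: List[Tuple[str, ...]] = []

    def start_api_call_timing(self, model: str) -> None:
        self.events.append(("start", model))

    def end_api_call_timing(self, success: bool, error: str = "") -> None:
        self.events.append(("end", str(success), error))

    def _timed_create(self, params: dict) -> Any:
        # Every attempt, original or compressed, opens its own timing window
        # and closes it on each failure path. On success the window is left
        # open so the caller can close it once the stream has been consumed.
        self.start_api_call_timing(params["model"])
        try:
            return self._create_stream(params)
        except CircuitBreakerOpenError:
            self.end_api_call_timing(success=False, error="circuit_breaker_open")
            raise
        except Exception as exc:
            self.end_api_call_timing(success=False, error=str(exc))
            raise

    def create_with_compression_retry(self, params: dict) -> Any:
        try:
            return self._timed_create(params)
        except ContextLengthError:
            # Placeholder compression: keep only the last input item.
            compressed = {**params, "input": params["input"][-1:]}
            return self._timed_create(compressed)
```

Because the retry reuses `_timed_create`, the compressed attempt cannot skip the fresh timing start or the open-circuit bookkeeping.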
massgen/backend/gemini.py (1)

255-308: ⚠️ Potential issue | 🟠 Major

Don't silently accept unused llm_circuit_breaker_* knobs on Gemini.

This backend builds a full LLMCircuitBreakerConfig, but the actual retry loop still runs off BackoffConfig and only uses the breaker for should_block()/record_*(). Settings like llm_circuit_breaker_retryable_status_codes, llm_circuit_breaker_backoff_multiplier, llm_circuit_breaker_max_backoff_seconds, and llm_circuit_breaker_retry_after_threshold_seconds are therefore accepted but never affect Gemini behavior. Either wire them into the Gemini retry loop or reject unsupported keys up front.

Based on learnings, GeminiBackend intentionally does NOT route stream creation through call_with_retry(), so any shared LLMCircuitBreakerConfig fields that only call_with_retry() honors need explicit Gemini wiring.

Also applies to: 352-367
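
The "reject unsupported keys up front" option could look roughly like the sketch below. The four key names are the ones listed in the comment above; the function name `reject_unsupported_cb_kwargs`, the constant name, and the error wording are assumptions for illustration.

```python
from typing import Any, Dict

# Keys the review lists as accepted but never applied by the Gemini retry loop.
UNSUPPORTED_GEMINI_CB_KEYS = frozenset(
    {
        "llm_circuit_breaker_retryable_status_codes",
        "llm_circuit_breaker_backoff_multiplier",
        "llm_circuit_breaker_max_backoff_seconds",
        "llm_circuit_breaker_retry_after_threshold_seconds",
    }
)


def reject_unsupported_cb_kwargs(kwargs: Dict[str, Any]) -> None:
    """Fail fast when a config sets breaker options this backend ignores.

    Args:
        kwargs: Backend keyword arguments as passed to the constructor.

    Raises:
        ValueError: If any unsupported circuit-breaker key is present.
    """
    present = sorted(UNSUPPORTED_GEMINI_CB_KEYS.intersection(kwargs))
    if present:
        raise ValueError(
            "These circuit-breaker options have no effect on this backend: "
            + ", ".join(present)
        )
```

Failing at construction time surfaces the mismatch in the config rather than leaving a setting that silently does nothing.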

🧹 Nitpick comments (2)
massgen/backend/chat_completions.py (1)

87-102: Centralize the circuit-breaker kwarg parser.

This exact _build_circuit_breaker_config() logic now lives in massgen/backend/chat_completions.py, massgen/backend/gemini.py, and massgen/backend/response.py. Moving it into a shared base/helper avoids config drift and lets you document the behavior once with the repo's required Google-style docstring.

As per coding guidelines, **/*.py: For new or changed functions, include Google-style docstrings.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@massgen/backend/chat_completions.py` around lines 87 - 102, The
_build_circuit_breaker_config function in chat_completions.py duplicates logic
present in massgen/backend/gemini.py and massgen/backend/response.py; extract
this kwarg-parsing logic into a single shared helper (e.g., a new function named
build_llm_circuit_breaker_config in a common module like massgen/backend/utils
or a base class) and replace the three copies with calls to that helper; ensure
the new helper accepts kwargs: dict[str, Any], removes parsed keys from the
dict, returns LLMCircuitBreakerConfig(**cb_kwargs), and include a Google-style
docstring describing parameters, return value, and behavior.
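
A shared helper along the lines proposed here might look like the sketch below. The `LLMCircuitBreakerConfig` dataclass is a simplified stand-in: only `enabled`, `max_failures`, and `reset_time_seconds` are mentioned in this PR, and the default values for the latter two are assumptions. The helper name follows the suggestion above.

```python
from dataclasses import dataclass
from typing import Any, Dict

CB_PREFIX = "llm_circuit_breaker_"


@dataclass
class LLMCircuitBreakerConfig:
    """Simplified stand-in for the real config class (fields assumed)."""

    enabled: bool = False  # disabled by default, per the commit notes
    max_failures: int = 5  # assumed default
    reset_time_seconds: int = 60  # assumed default

    def __post_init__(self) -> None:
        if self.max_failures < 1:
            raise ValueError("max_failures must be >= 1")
        if self.reset_time_seconds < 0:
            raise ValueError("reset_time_seconds must be >= 0")


def build_llm_circuit_breaker_config(kwargs: Dict[str, Any]) -> LLMCircuitBreakerConfig:
    """Build a circuit-breaker config from prefixed backend kwargs.

    Args:
        kwargs: Backend keyword arguments. Keys starting with
            ``llm_circuit_breaker_`` are removed from this dict in place.

    Returns:
        The config built from the extracted keys, with the prefix stripped.

    Raises:
        ValueError: If an extracted value fails validation.
        TypeError: If an extracted key is not a known config field.
    """
    cb_kwargs: Dict[str, Any] = {}
    # Snapshot the matching keys first so the dict can be mutated safely.
    for key in [k for k in kwargs if k.startswith(CB_PREFIX)]:
        cb_kwargs[key[len(CB_PREFIX):]] = kwargs.pop(key)
    return LLMCircuitBreakerConfig(**cb_kwargs)
```

Each backend would then call the helper once in `__init__`, so the stripping rules and defaults live in one place.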
massgen/tests/test_llm_cb_backends.py (1)

199-225: Consider mocking time for faster, more reliable tests.

Multiple tests use time.sleep(1.1) to trigger time-based state transitions, adding ~4+ seconds to the test suite. This can cause flakiness on slow CI runners.

Consider using unittest.mock.patch on time.time (or freezegun) to control time progression without actual delays:

♻️ Example refactor using time mock
+from unittest.mock import patch
+
 def test_open_to_half_open_after_reset_time(self):
     cb = self._make_cb(max_failures=1, reset_time_seconds=1)
+    with patch('time.time') as mock_time:
+        mock_time.return_value = 1000.0
+        cb.record_failure()
+        assert cb.state == CircuitState.OPEN
+        
+        # Advance time past reset_time_seconds
+        mock_time.return_value = 1001.1
+        blocked = cb.should_block()
+        assert blocked is False
+        assert cb.state == CircuitState.HALF_OPEN
-    cb.record_failure()
-    assert cb.state == CircuitState.OPEN
-    time.sleep(1.1)
-    # should_block triggers the OPEN -> HALF_OPEN transition
-    blocked = cb.should_block()
-    assert blocked is False
-    assert cb.state == CircuitState.HALF_OPEN
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@massgen/tests/test_llm_cb_backends.py` around lines 199 - 225, These tests
use time.sleep(1.1) to trigger the CircuitBreaker OPEN->HALF_OPEN reset window
(tests: test_open_to_half_open_after_reset_time,
test_half_open_to_closed_on_success, test_half_open_to_open_on_failure) which
slows and flakes CI; replace real sleeps by mocking time.time (e.g., with
unittest.mock.patch) to control the clock: create the cb with
reset_time_seconds, call cb.record_failure(), then advance mocked time beyond
reset_time_seconds before calling cb.should_block() to force the OPEN->HALF_OPEN
transition, and similarly advance time for the other tests before invoking
should_block()/record_success()/record_failure(); ensure the mock applies to the
code path used by CircuitBreaker.should_block/record_failure/record_success and
restore the original time after each test.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@massgen/backend/chat_completions.py`:
- Around line 307-316: The AsyncOpenAI SDK's built-in retries are currently
enabled and will multiply attempts when wrapped by
circuit_breaker.call_with_retry; disable SDK retries by setting max_retries=0 on
the AsyncOpenAI client at creation (the `AsyncOpenAI` instance assigned to
`client`) or, if you prefer a per-call change, call
`client.with_options(max_retries=0)` inside the `_make_api_call` wrapper before
invoking `client.chat.completions.create` so each circuit-breaker retry
corresponds to a single HTTP request.

In `@massgen/backend/response.py`:
- Around line 1805-1817: The HTTP path uses client.responses.create wrapped by
self.circuit_breaker.call_with_retry in _create_response_stream, but the
AsyncOpenAI client defaults to max_retries=2 causing double-retries; fix by
ensuring the OpenAI client is created with max_retries=0 in _create_client (set
client_kwargs["max_retries"] = 0) or, if you prefer per-call control, call
client.with_options(max_retries=0).responses.create(...) so the
circuit_breaker.call_with_retry handles retries exclusively and the client does
not perform its own retrying.
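
To make the retry multiplication concrete, the toy simulation below counts HTTP-level attempts when an SDK-style inner retry loop is wrapped by an outer breaker-style retry loop. It does not use the OpenAI SDK; `CountingTransport`, `sdk_call`, `breaker_call`, and `count_requests` are hypothetical helpers, and the failure is simulated with `ConnectionError`.

```python
from typing import Callable, Optional


class CountingTransport:
    """Counts HTTP-level attempts; always fails to show the worst case."""

    def __init__(self) -> None:
        self.requests = 0

    def send(self) -> str:
        self.requests += 1
        raise ConnectionError("simulated retryable failure")


def sdk_call(transport: CountingTransport, sdk_max_retries: int) -> str:
    """Mimics a client with built-in retries: 1 try plus sdk_max_retries retries."""
    for attempt in range(sdk_max_retries + 1):
        try:
            return transport.send()
        except ConnectionError:
            if attempt == sdk_max_retries:
                raise
    raise AssertionError("unreachable")


def breaker_call(fn: Callable[[], str], breaker_attempts: int) -> str:
    """Mimics an outer retry wrapper that re-invokes fn on failure."""
    last_error: Optional[Exception] = None
    for _ in range(breaker_attempts):
        try:
            return fn()
        except ConnectionError as exc:
            last_error = exc
    raise last_error if last_error else RuntimeError("no attempts made")


def count_requests(sdk_max_retries: int, breaker_attempts: int) -> int:
    """Return how many requests reach the transport when every call fails."""
    transport = CountingTransport()
    try:
        breaker_call(lambda: sdk_call(transport, sdk_max_retries), breaker_attempts)
    except ConnectionError:
        pass
    return transport.requests
```

With an inner `max_retries` of 2 and three outer attempts, `count_requests(2, 3)` yields 9 requests; with the inner retries set to 0, `count_requests(0, 3)` yields 3, one request per outer attempt.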


ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 790ab125-c036-4f3b-8716-587f8bb7fdd0

📥 Commits

Reviewing files that changed from the base of the PR and between 2aa4178 and 342793a.

📒 Files selected for processing (4)
  • massgen/backend/chat_completions.py
  • massgen/backend/gemini.py
  • massgen/backend/response.py
  • massgen/tests/test_llm_cb_backends.py


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (3)
massgen/agent_config.py (1)

952-962: ⚠️ Potential issue | 🟠 Major

for_computational_task(backend="grok") still builds an OpenAI model by default.

With the current signature, AgentConfig.for_computational_task(backend="grok") passes the function default "gpt-4o" into create_grok_config(). Unless every caller overrides model, the new Grok branch returns an invalid backend/model pairing.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@massgen/agent_config.py` around lines 952 - 962, for_computational_task
currently uses a single default model ("gpt-4o") and passes it to
create_grok_config when backend=="grok", producing an invalid OpenAI/Grok
pairing; change the logic so the model default is chosen based on backend (e.g.,
set model = "gpt-4o" for openai and model = "<grok-default>" for grok) or make
the signature model: Optional[str]=None and inside for_computational_task assign
a backend-appropriate default before calling create_openai_config or
create_grok_config; update references to for_computational_task,
create_openai_config, and create_grok_config accordingly.
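
The second option, an optional `model` resolved per backend, can be sketched as below. `resolve_model` and `DEFAULT_MODELS` are hypothetical names. The OpenAI default "gpt-4o" comes from this comment, and the Grok default is the registry value quoted in a separate comment on this PR; both would need to be checked against the current code.

```python
from typing import Dict, Optional

# Assumed per-backend defaults, taken from values quoted in the review.
DEFAULT_MODELS: Dict[str, str] = {
    "openai": "gpt-4o",
    "grok": "grok-4.20-0309-reasoning",
}


def resolve_model(backend: str, model: Optional[str] = None) -> str:
    """Pick a backend-appropriate model when the caller did not pass one.

    Args:
        backend: Backend identifier, for example "openai" or "grok".
        model: Explicit model name, or None to use the backend default.

    Returns:
        The explicit model if given, otherwise the default for the backend.

    Raises:
        ValueError: If no model is given and the backend has no known default.
    """
    if model is not None:
        return model
    try:
        return DEFAULT_MODELS[backend]
    except KeyError:
        raise ValueError(f"no default model known for backend {backend!r}") from None
```

An explicit `model` argument always wins, so existing callers that pass one keep their behavior.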
massgen/config_validator.py (1)

615-624: ⚠️ Potential issue | 🔴 Critical

Add enable_x_search to shared exclusion lists in two files.

enable_x_search was added to massgen/config_validator.py, but it must also be added to the get_base_excluded_config_params() method in both:

  • massgen/backend/base.py
  • massgen/api_params_handler/_api_params_handler_base.py

Without this, the parameter will be validated in config_validator but may bypass exclusion and be forwarded as raw kwargs elsewhere, breaking the established parameter-handling pattern.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@massgen/config_validator.py` around lines 615 - 624, Add "enable_x_search" to
the shared exclusion lists returned by get_base_excluded_config_params() in both
the BaseBackend class (method get_base_excluded_config_params in
massgen/backend/base.py) and the ApiParamsHandlerBase (method
get_base_excluded_config_params in
massgen/api_params_handler/_api_params_handler_base.py) so the new boolean
parameter is excluded consistently; locate each
get_base_excluded_config_params() implementation and append "enable_x_search" to
the returned list/set of excluded config keys to match
massgen/config_validator.py.
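
A small consistency check could guard against the two exclusion lists drifting apart. The sketch below is an assumption-heavy illustration: `find_missing_exclusions` and `REQUIRED_PARAMS` are hypothetical, and in a real test the sets would come from the two `get_base_excluded_config_params()` methods rather than being written by hand.

```python
from typing import Dict, Iterable, Set

# Parameters the review checks mention; extend as new YAML options are added.
REQUIRED_PARAMS = {"enable_x_search", "enable_code_execution"}


def find_missing_exclusions(
    required: Iterable[str],
    exclusion_sets: Dict[str, Set[str]],
) -> Dict[str, Set[str]]:
    """Return, per named source, the required params that are not excluded.

    Args:
        required: Parameter names that every exclusion set must contain.
        exclusion_sets: Mapping from a label to that source's excluded params.

    Returns:
        A mapping that contains only the sources with missing params.
    """
    required_set = set(required)
    missing: Dict[str, Set[str]] = {}
    for name, excluded in exclusion_sets.items():
        gap = required_set - set(excluded)
        if gap:
            missing[name] = gap
    return missing
```

An empty result means both sources agree, so a unit test can simply assert that the returned mapping is empty.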
massgen/configs/features/trace_analyzer_background.yaml (1)

4-84: ⚠️ Potential issue | 🟠 Major

This sample is single-agent now, but the orchestrator still expects 3 votes.

After commenting out agent_a and agent_b, only agent_c remains. With voting_sensitivity: checklist_gated and voting_threshold: 3, this config can’t reach its own acceptance threshold and will likely run until timeout instead of converging.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@massgen/configs/features/trace_analyzer_background.yaml` around lines 4 - 84,
The orchestrator’s voting config (voting_sensitivity: checklist_gated and
voting_threshold: 3) no longer matches the active agents because agent_a and
agent_b were commented out, leaving only agent_c; update the config so
convergence is possible by either restoring agent_a/agent_b or lowering
voting_threshold to 1 (or changing voting_sensitivity to a
single-agent-compatible mode) and ensure references to agent_c remain unchanged
so the orchestrator can reach acceptance with the single remaining agent.
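
A config-load check could catch this kind of mismatch early. The sketch below assumes, as the comment does, that `voting_threshold` counts votes and that each active agent contributes at most one. `check_voting_threshold` is a hypothetical helper, not part of massgen.

```python
from typing import Sequence


def check_voting_threshold(agent_ids: Sequence[str], voting_threshold: int) -> None:
    """Reject a threshold that the active agents can never reach.

    Args:
        agent_ids: Identifiers of the agents that are actually enabled.
        voting_threshold: Number of votes required for acceptance.

    Raises:
        ValueError: If the threshold exceeds the number of active agents.
    """
    if voting_threshold > len(agent_ids):
        raise ValueError(
            f"voting_threshold={voting_threshold} cannot be reached with "
            f"{len(agent_ids)} active agent(s): {', '.join(agent_ids) or 'none'}"
        )
```

For the config described above, `check_voting_threshold(["agent_c"], 3)` would raise, while a threshold of 1 would pass.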
🧹 Nitpick comments (2)
massgen/agent_config.py (1)

677-684: Align create_grok_config() with the registry default model.

This helper still defaults to grok-2-1212, while the backend registry in this PR advertises grok-4.20-0309-reasoning as Grok's default. Keeping two defaults for the same backend will make helper-based configs drift from capability-based configs.

Suggested fix
-        model: str = "grok-2-1212",
+        model: str = "grok-4.20-0309-reasoning",
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@massgen/agent_config.py` around lines 677 - 684, The helper
create_grok_config currently defaults model="grok-2-1212" which conflicts with
the backend registry's advertised default; update the default parameter in
create_grok_config to model="grok-4.20-0309-reasoning" (and adjust any related
tests or docstrings referencing the old default) so helper-based configs align
with the registry default; locate the function by name create_grok_config to
make this single-parameter change.
massgen/backend/grok.py (1)

16-36: Add Google-style docstrings to the changed methods.

__init__ has none, and the new helper/accessor docstrings are still one-liners. This file’s changed Python functions should use the repo’s Google-style format.

As per coding guidelines, **/*.py: For new or changed functions, include Google-style docstrings

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@massgen/backend/grok.py` around lines 16 - 36, The changed methods lack
Google-style docstrings; add Google-style docstrings to __init__,
_reject_legacy_search_parameters, get_provider_name, and
get_supported_builtin_tools describing parameters, returns, and exceptions where
appropriate. For __init__, document api_key: str | None, **kwargs, and class
behavior (sets api_key, calls _reject_legacy_search_parameters, sets base_url);
for _reject_legacy_search_parameters, document kwargs: dict[str, Any] and raise
ValueError when legacy search_parameters found; for get_provider_name and
get_supported_builtin_tools, document return type and brief description of the
returned values. Keep wording concise and follow existing repo Google-style
docstring conventions.
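
For reference, a Google-style docstring on the legacy-parameter check might read as in the sketch below. The function body is an assumption based on the walkthrough's mention of `extra_body.search_parameters`; the real `_reject_legacy_search_parameters` may inspect other locations or use different wording.

```python
from typing import Any, Dict


def reject_legacy_search_parameters(kwargs: Dict[str, Any]) -> None:
    """Reject the legacy search configuration.

    Args:
        kwargs: Backend keyword arguments to inspect. The dict is not modified.

    Raises:
        ValueError: If ``extra_body`` contains a ``search_parameters`` entry.
    """
    extra_body = kwargs.get("extra_body") or {}
    if "search_parameters" in extra_body:
        raise ValueError(
            "extra_body.search_parameters is no longer supported; "
            "use enable_x_search instead."
        )
```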
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@massgen/backend/capabilities.py`:
- Around line 861-867: The validation is comparing the raw backend_type instead
of the normalized name returned by get_capabilities, causing valid
aliases/casing like "Grok" to be rejected; update the Grok-only branches to use
the normalized backend identifier from the capabilities object returned by
get_capabilities (i.e., use the normalized value on caps rather than
backend_type) wherever you check for Grok support (the enable_x_search block
that references backend_type and the similar block at lines ~895-900), and
ensure the x_search support check still uses caps.supported_capabilities.
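
The difference between checking the raw name and the normalized one can be sketched as below. Everything here is hypothetical: `CapsSketch`, `REGISTRY`, this `get_capabilities`, and `validate_x_search` are stand-ins, and normalization is assumed to be a strip-and-lowercase step, which may not match how the real registry resolves aliases.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional, Set


@dataclass
class CapsSketch:
    """Stand-in for a capabilities record carrying the normalized name."""

    backend_type: str
    supported_capabilities: Set[str] = field(default_factory=set)


REGISTRY: Dict[str, CapsSketch] = {
    "grok": CapsSketch("grok", {"x_search"}),
    "openai": CapsSketch("openai"),
}


def get_capabilities(backend_type: str) -> Optional[CapsSketch]:
    # Assumed normalization: trim whitespace and lowercase.
    return REGISTRY.get(backend_type.strip().lower())


def validate_x_search(backend_type: str, config: Dict[str, Any]) -> List[str]:
    """Return validation errors for enable_x_search on the given backend."""
    caps = get_capabilities(backend_type)
    if caps is None:
        return [f"unknown backend: {backend_type!r}"]
    errors: List[str] = []
    if config.get("enable_x_search"):
        # Compare the normalized name on caps, not the raw user input.
        if caps.backend_type != "grok" or "x_search" not in caps.supported_capabilities:
            errors.append(f"enable_x_search is not supported by {caps.backend_type}")
    return errors
```

With this shape, `"Grok"` and `"grok"` are treated the same because the comparison never touches the raw string.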

In `@massgen/backend/grok.py`:
- Around line 17-19: The constructor currently uses
kwargs.setdefault("base_url", self.XAI_BASE_URL) which leaves falsy values like
None or "" in place; change it to explicitly treat falsy base_url as unset by
checking if not kwargs.get("base_url") and then assigning kwargs["base_url"] =
self.XAI_BASE_URL before calling super().__init__; this ensures
ResponseBackend._create_client() sees the intended XAI_BASE_URL when upstream
passes null/empty values. Ensure the change is made near the calls to
_reject_legacy_search_parameters and super().__init__ and references
XAI_BASE_URL.
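
The behavior described here follows from how `dict.setdefault` works: it only assigns when the key is absent, so an explicit `None` or empty string is left untouched. The short sketch below contrasts the two approaches. The URL is a placeholder rather than the real `XAI_BASE_URL` value, and both helper names are hypothetical.

```python
from typing import Any, Dict

XAI_BASE_URL = "https://example.invalid/v1"  # placeholder, not the real constant


def with_setdefault(kwargs: Dict[str, Any]) -> Dict[str, Any]:
    """Apply the default only when the key is missing entirely."""
    result = dict(kwargs)
    result.setdefault("base_url", XAI_BASE_URL)
    return result


def with_falsy_check(kwargs: Dict[str, Any]) -> Dict[str, Any]:
    """Apply the default when the key is missing, None, or empty."""
    result = dict(kwargs)
    if not result.get("base_url"):
        result["base_url"] = XAI_BASE_URL
    return result
```

Given `{"base_url": None}`, `with_setdefault` keeps `None` while `with_falsy_check` substitutes the default; an explicit non-empty URL is preserved by both.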

---

Outside diff comments:
In `@massgen/agent_config.py`:
- Around line 952-962: for_computational_task currently uses a single default
model ("gpt-4o") and passes it to create_grok_config when backend=="grok",
producing an invalid OpenAI/Grok pairing; change the logic so the model default
is chosen based on backend (e.g., set model = "gpt-4o" for openai and model =
"<grok-default>" for grok) or make the signature model: Optional[str]=None and
inside for_computational_task assign a backend-appropriate default before
calling create_openai_config or create_grok_config; update references to
for_computational_task, create_openai_config, and create_grok_config
accordingly.

In `@massgen/config_validator.py`:
- Around line 615-624: Add "enable_x_search" to the shared exclusion lists
returned by get_base_excluded_config_params() in both the BaseBackend class
(method get_base_excluded_config_params in massgen/backend/base.py) and the
ApiParamsHandlerBase (method get_base_excluded_config_params in
massgen/api_params_handler/_api_params_handler_base.py) so the new boolean
parameter is excluded consistently; locate each
get_base_excluded_config_params() implementation and append "enable_x_search" to
the returned list/set of excluded config keys to match
massgen/config_validator.py.
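
A small sketch of a consistency check that could guard against the two exclusion lists drifting apart. The function name and the sample sets are hypothetical; in a real test the sets would come from the two `get_base_excluded_config_params()` implementations named above.

```python
from __future__ import annotations

from typing import Iterable, Mapping


def lists_missing_key(key: str, named_lists: Mapping[str, Iterable[str]]) -> list[str]:
    """Return the names of the exclusion lists that do not contain `key`."""
    return sorted(name for name, keys in named_lists.items() if key not in set(keys))


# Stand-in data: in practice these would be the values returned by the
# BaseBackend and ApiParamsHandlerBase methods.
sample = {
    "BaseBackend": {"enable_x_search", "some_other_param"},
    "ApiParamsHandlerBase": {"some_other_param"},
}
print(lists_missing_key("enable_x_search", sample))
```

An empty result means every list excludes the key; any name in the result points at the implementation that still needs the update.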

In `@massgen/configs/features/trace_analyzer_background.yaml`:
- Around line 4-84: The orchestrator’s voting config (voting_sensitivity:
checklist_gated and voting_threshold: 3) no longer matches the active agents
because agent_a and agent_b were commented out, leaving only agent_c; update the
config so convergence is possible by either restoring agent_a/agent_b or
lowering voting_threshold to 1 (or changing voting_sensitivity to a
single-agent-compatible mode) and ensure references to agent_c remain unchanged
so the orchestrator can reach acceptance with the single remaining agent.

---

Nitpick comments:
In `@massgen/agent_config.py`:
- Around line 677-684: The helper create_grok_config currently defaults
model="grok-2-1212" which conflicts with the backend registry's advertised
default; update the default parameter in create_grok_config to
model="grok-4.20-0309-reasoning" (and adjust any related tests or docstrings
referencing the old default) so helper-based configs align with the registry
default; locate the function by name create_grok_config to make this
single-parameter change.

In `@massgen/backend/grok.py`:
- Around line 16-36: The changed methods lack Google-style docstrings; add
Google-style docstrings to __init__, _reject_legacy_search_parameters,
get_provider_name, and get_supported_builtin_tools describing parameters,
returns, and exceptions where appropriate. For __init__, document api_key: str |
None, **kwargs, and class behavior (sets api_key, calls
_reject_legacy_search_parameters, sets base_url); for
_reject_legacy_search_parameters, document kwargs: dict[str, Any] and raise
ValueError when legacy search_parameters found; for get_provider_name and
get_supported_builtin_tools, document return type and brief description of the
returned values. Keep wording concise and follow existing repo Google-style
docstring conventions.
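
For reference, a Google-style docstring on the rejection helper might look like the sketch below. It is written as a free function for brevity, and the body is an assumption based only on the description above (raise `ValueError` when `search_parameters` is present); the real method may check differently or use another message.

```python
from __future__ import annotations

from typing import Any


def reject_legacy_search_parameters(kwargs: dict[str, Any]) -> None:
    """Reject the legacy ``search_parameters`` option.

    Args:
        kwargs: Backend keyword arguments to inspect.

    Raises:
        ValueError: If ``search_parameters`` is present in ``kwargs``.
    """
    # Assumed check: presence of the key is enough to reject the config.
    if "search_parameters" in kwargs:
        raise ValueError("search_parameters is no longer supported")
```

The same `Args` / `Returns` / `Raises` layout would apply to `__init__`, `get_provider_name`, and `get_supported_builtin_tools`, with `Returns` replacing `Raises` where a method only returns a value.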
ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: c02f6262-8b5f-4928-8039-336b5131428f

📥 Commits

Reviewing files that changed from the base of the PR and between 342793a and 2cbe795.

📒 Files selected for processing (11)
  • massgen/agent_config.py
  • massgen/api_params_handler/_response_api_params_handler.py
  • massgen/backend/capabilities.py
  • massgen/backend/grok.py
  • massgen/backend/response.py
  • massgen/chat_agent.py
  • massgen/config_validator.py
  • massgen/configs/features/trace_analyzer_background.yaml
  • massgen/configs/providers/others/grok_single_agent.yaml
  • massgen/tests/test_backend_capabilities.py
  • massgen/tests/test_backend_event_loop_all.py
✅ Files skipped from review due to trivial changes (1)
  • massgen/configs/providers/others/grok_single_agent.yaml

@coderabbitai coderabbitai bot left a comment

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
ROADMAP.md (1)

903-903: ⚠️ Potential issue | 🟡 Minor

Update footer date to match header.

The "Last Updated" date at the top of the document (line 7) was changed to "April 3, 2026", but the footer "Last Updated" at line 903 still shows "March 11, 2026". For consistency, both should show the same date.

📅 Proposed fix for date consistency
-**Last Updated:** March 11, 2026
+**Last Updated:** April 3, 2026
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@ROADMAP.md` at line 903, Update the footer date string "**Last Updated:**
March 11, 2026" to match the header by changing it to "**Last Updated:** April
3, 2026"; locate the footer occurrence of the literal "**Last Updated:** March
11, 2026" in the document (the header already shows "April 3, 2026") and replace
only that date to keep wording consistent.
🧹 Nitpick comments (1)
docs/announcements/github-release-v0.1.72.md (1)

1-12: Fix heading hierarchy to satisfy MD001 and preserve document outline.

Line 3, Line 6, and Line 12 jump from H1 to H3. Promote these to H2.

Proposed patch
-### 🦎 [Grok Backend Update](https://docs.massgen.ai/en/latest/user_guide/backends.html)
+## 🦎 [Grok Backend Update](https://docs.massgen.ai/en/latest/user_guide/backends.html)
@@
-### ⚡ [Circuit Breaker Phase 2](https://docs.massgen.ai/en/latest/user_guide/backends.html)
+## ⚡ [Circuit Breaker Phase 2](https://docs.massgen.ai/en/latest/user_guide/backends.html)
@@
-### 📖 Getting Started
+## 📖 Getting Started
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/announcements/github-release-v0.1.72.md` around lines 1 - 12, The
document uses H1 for the title "# 🚀 Release Highlights — v0.1.72" but then
jumps to H3 for section headings (e.g., "### 🦎 [Grok Backend Update]", "### ⚡
[Circuit Breaker Phase 2]", and "### 📖 Getting Started"); update those three
headings to H2 (change "###" to "##") so the hierarchy is H1 → H2 and MD001 is
satisfied while preserving the existing heading texts.
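
A rough sketch of how a heading jump like this can be detected before review. It approximates the MD001 idea (heading levels should increase by at most one at a time) for ATX headings only and is not the markdownlint implementation; the function name is hypothetical.

```python
from __future__ import annotations

import re

HEADING = re.compile(r"^(#{1,6})\s")


def heading_jumps(markdown: str) -> list[tuple[int, int, int]]:
    """Return (line_number, previous_level, level) for headings that skip a level."""
    jumps = []
    previous = 0
    in_fence = False
    for number, line in enumerate(markdown.splitlines(), start=1):
        # Ignore '#' lines inside fenced code blocks.
        if line.lstrip().startswith("```"):
            in_fence = not in_fence
            continue
        if in_fence:
            continue
        match = HEADING.match(line)
        if not match:
            continue
        level = len(match.group(1))
        # The first heading is never flagged; later ones may deepen by one.
        if previous and level > previous + 1:
            jumps.append((number, previous, level))
        previous = level
    return jumps
```

Run against the release notes, a result such as `(3, 1, 3)` would mean line 3 jumps from H1 to H3, which is the pattern flagged in this comment.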
ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 10b701cd-a46b-42c7-b078-b98a8d36c36a

📥 Commits

Reviewing files that changed from the base of the PR and between 2cbe795 and 984bf22.

📒 Files selected for processing (12)
  • CHANGELOG.md
  • CONTRIBUTING.md
  • README.md
  • README_PYPI.md
  • ROADMAP.md
  • ROADMAP_v0.1.73.md
  • docs/announcements/archive/v0.1.71.md
  • docs/announcements/current-release.md
  • docs/announcements/github-release-v0.1.71.md
  • docs/announcements/github-release-v0.1.72.md
  • docs/source/index.rst
  • massgen/configs/README.md
💤 Files with no reviewable changes (1)
  • docs/announcements/github-release-v0.1.71.md
✅ Files skipped from review due to trivial changes (8)
  • CONTRIBUTING.md
  • ROADMAP_v0.1.73.md
  • docs/announcements/archive/v0.1.71.md
  • docs/announcements/current-release.md
  • docs/source/index.rst
  • massgen/configs/README.md
  • README_PYPI.md
  • CHANGELOG.md

@Henry-811 Henry-811 merged commit 665005d into main Apr 3, 2026
21 checks passed